ON THE EXCHANGE OF INTERSECTION AND SUPREMUM OF 
Princeton University 

We construct a stationary Markov process with trivial tail σ-field and a
nondegenerate observation process such that the corresponding nonlinear
filtering process is not uniquely ergodic. This settles in the negative a conjecture
of the author in the ergodic theory of nonlinear filters arising from an
erroneous proof in the classic paper of H. Kunita (1971), wherein an exchange of
intersection and supremum of σ-fields is taken for granted.



1. Introduction and main result. Let E and F be Polish spaces, and consider
an $E \times F$-valued stochastic process $(X_k, Y_k)_{k\in\mathbb{Z}}$ with the following properties:

1. $(X_k, Y_k)_{k\in\mathbb{Z}}$ is a stationary Markov process.

2. There exist transition kernels $P$ from E to E and $\Phi$ from E to F such that

$\mathbf{P}[(X_n, Y_n) \in A \mid X_{n-1}, Y_{n-1}] = \int 1_A(x, y)\, P(X_{n-1}, dx)\, \Phi(x, dy).$

Such a process is called a stationary hidden Markov model; its dependence
structure is illustrated schematically in Figure 1. In applications, $(X_k)_{k\in\mathbb{Z}}$ represents
a "hidden" process which is not directly observable, while the observable process
$(Y_k)_{k\in\mathbb{Z}}$ represents "noisy observations" of the hidden process [4].

Of fundamental importance in the theory of hidden Markov models is the
nonlinear filter $(\pi_k)_{k\ge 0}$, defined as the regular conditional probability

$\pi_n = \mathbf{P}[X_n \in \cdot \mid Y_1, \ldots, Y_n].$

That is, $\pi_n$ is the conditional distribution of the current state of the hidden process
given the observations to date. It is a basic fact in this theory that the filtering
process $(\pi_k)_{k\ge 0}$ is itself a Markov process taking values in the space $\mathcal{P}(E)$ of
probability measures on E, whose transition kernel $\mathsf{\Pi}$ can be expressed in terms of
the transition kernels $P$ and $\Phi$ that determine the model (this and other basic facts
on nonlinear filters are reviewed in the appendix).

Following Kunita [12], we will be interested in the structure of the space of
$\mathsf{\Pi}$-invariant probability measures in $\mathcal{P}(\mathcal{P}(E))$. It is easily seen that for every
$\mathsf{\Pi}$-invariant measure $m \in \mathcal{P}(\mathcal{P}(E))$, the barycenter $\mu \in \mathcal{P}(E)$ of $m$ must be invariant
for the transition kernel $P$ of the hidden process. Conversely, for every $P$-invariant
measure $\mu \in \mathcal{P}(E)$, there exists at least one $\mathsf{\Pi}$-invariant measure $m \in \mathcal{P}(\mathcal{P}(E))$
whose barycenter is $\mu$. However, the latter need not be unique.

FIG 1. Dependence structure of a hidden Markov model.

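THEOREM 1.1 (Kunita). Let $\mathbf{P}[X_0 \in \cdot\,] := \mu$ be the $P$-invariant measure
defined by the stationary hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$ as above. If

(1.1)   $\bigcap_{n\le 0} \big(\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n}\big) = \mathcal{F}^Y_{-\infty,0}$   $\mathbf{P}$-a.s.,

there exists a unique $\mathsf{\Pi}$-invariant measure with barycenter $\mu$. The converse
holds if in addition $\Phi$ possesses a transition density with respect to some σ-finite
reference measure. [Here $\mathcal{F}^Y_{-\infty,0} := \sigma\{Y_k : k \le 0\}$, $\mathcal{F}^X_{-\infty,n} := \sigma\{X_k : k \le n\}$.]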

Remark 1.2. Though the main ideas of the proof are implicitly contained in 
[12], this simple and general statement does not appear in the literature without var- 
ious additional simplifying assumptions. For completeness, and in order to make 
this paper self-contained, we therefore include the proof in the appendix. 

Theorem 1.1 is not actually stated as such by Kunita [12]. Instead, Kunita
assumes that the hidden process $(X_k)_{k\in\mathbb{Z}}$ is purely nondeterministic:

Definition 1.3. A stochastic process $(X_k)_{k\in\mathbb{Z}}$ is called purely nondeterministic
if its past tail σ-field $\bigcap_{n\le 0} \mathcal{F}^X_{-\infty,n}$ is $\mathbf{P}$-a.s. trivial.

Kunita's main theorem states (see footnote 1) that if the hidden process $(X_k)_{k\in\mathbb{Z}}$ is purely
nondeterministic, then there exists a unique $\mathsf{\Pi}$-invariant measure with barycenter $\mu$.
Kunita's proof, however, does not establish this claim. Indeed, at the crucial point
in the proof ([12], top of p. 384), Kunita implicitly takes for granted that the
following exchange of intersection and supremum is permitted:

(1.2)   $\bigcap_{n\le 0} \big(\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n}\big) = \mathcal{F}^Y_{-\infty,0} \vee \bigcap_{n\le 0} \mathcal{F}^X_{-\infty,n}$   $\mathbf{P}$-a.s.

If this exchange were justified, then Kunita's result would indeed follow
immediately from Theorem 1.1. However, in general, such an exchange of intersection and
supremum is not permitted, as will be shown in section 1.1 below.

Footnote 1. In fact, Kunita's paper is written in the context of a continuous time model
with white noise observations. None of these specific features are used in the proofs, however.

The goal of this paper is to settle, in the negative, a natural conjecture on the 
validity of the identity (1.2). Before we can describe the conjecture, we must review 
what is known about the validity of (1.2) in the filtering setting. 

Remark 1.4. Beyond the relevance of (1.2) to filtering theory, the problem 
studied in this paper provides a case study on an enigmatic problem: when is 
the exchange of countable intersection and supremum of σ-fields permitted? Such 
problems arise in remarkably diverse areas of probability theory. The following 
references provide some further context on this general problem. 

1. Several distinguished mathematicians have given erroneous proofs related 
to the exchange of intersection and supremum of σ-fields, including Kolmogorov
(see [22], p. 837) and Wiener (see [15], pp. 91-93). 

2. A simple counterexample to the validity of the exchange of intersection and 
supremum due to Barlow and Perkins can be found in [31], p. 48. This example
is closely related to the example given in section 1.1 below. See also 
[5], pp. 29-30 and the references therein. 

3. The exchange of intersection and supremum appears in diverse probabilistic 
settings: see [29], section 5 and the references therein for various exam- 
ples and counterexamples. In particular, the innovations problem and several 
variants of Tsirelson's celebrated counterexample provide a rich setting in 
which one can study the exchange of intersection and supremum problem; 
see [32, 9, 13, 2] and the references therein. See also [27] for a different 
connection to filtering theory. 

4. Von Weizsäcker [29] gives a general necessary and sufficient condition for 
validity of the exchange of intersection and supremum, which is however 
often difficult to apply in practice. It is shown in [7] that the exchange of 
intersection and supremum is always valid in a given probability space if 
and only if its probability measure is purely atomic. 

1.1. A simple counterexample. The gap in Kunita's proof was discovered in 
[1], where a simple counterexample to (1.2) was given. The following variant of 
this example will be helpful in understanding our main result. 
Let $(\xi_k)_{k\in\mathbb{Z}}$ be an i.i.d. sequence of (Bernoulli) random variables uniformly
distributed in {0, 1}. Let E = {0, 1} × {0, 1} and F = {0, 1}, and define the
stochastic process $(X_k, Y_k)_{k\in\mathbb{Z}}$ taking values in E × F as follows:

$X_n = (\xi_{n-1}, \xi_n), \qquad Y_n = |\xi_n - \xi_{n-1}|.$

It is evident that $(X_k, Y_k)_{k\in\mathbb{Z}}$ is a stationary hidden Markov model. Note that:

• Clearly $\xi_0 = (\xi_{n-1} + Y_n + \cdots + Y_0) \bmod 2$ for any $n \le 0$. Therefore,

$\xi_0$ is $\bigcap_{n\le 0} \big(\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n}\big)$-measurable.

• On the other hand, as $\mathbf{P}[\xi_0 = 0 \mid \mathcal{F}^Y_{-\infty,0}] = 1/2$ by direct computation,

$\xi_0$ is not $\mathcal{F}^Y_{-\infty,0}$-measurable $\mathbf{P}$-a.s.

• $(X_k)_{k\in\mathbb{Z}}$ is purely nondeterministic by the Kolmogorov zero-one law.

Therefore, evidently the identity (1.2) does not hold in this example.
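
As an informal numerical illustration of the two bullet points above (a sketch only, not part of the argument), the following Python snippet works on a finite window of the bit sequence. The function names and the convention that the last list entry plays the role of $\xi_0$ are assumptions made for this example. It checks that one hidden bit from arbitrarily far in the past, together with the observations since then, recovers the final bit, while the observations alone cannot distinguish a bit sequence from its complement.

```python
import random


def sample_hidden_bits(length, seed=0):
    """Draw i.i.d. uniform bits; the last entry plays the role of xi_0."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def observations(xi):
    """Return ys with ys[k - 1] = |xi[k] - xi[k - 1]| for k = 1, ..., len(xi) - 1."""
    return [abs(xi[k] - xi[k - 1]) for k in range(1, len(xi))]


def recover_last(xi, start):
    """Recover the final bit from xi[start - 1] and the observations from index start on."""
    ys = observations(xi)
    return (xi[start - 1] + sum(ys[start - 1:])) % 2


xi = sample_hidden_bits(40, seed=1)
# One hidden bit from the past plus the later observations pins down the final bit.
print(all(recover_last(xi, s) == xi[-1] for s in range(1, len(xi))))
# Flipping every hidden bit leaves the observations unchanged but changes the final bit.
flipped = [1 - b for b in xi]
print(observations(flipped) == observations(xi), flipped[-1] != xi[-1])
```

The second check is the finite-window shadow of the fact that $\xi_0$ is not determined by the observations alone.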

1.2. A positive result and a conjecture. In view of the counterexample above, 
one might expect that the gap in Kunita's proof cannot be resolved in general.
However, it turns out that such counterexamples are extremely fragile. For example, let
$(\gamma_k)_{k\in\mathbb{Z}}$ be an i.i.d. sequence of standard Gaussian random variables, and let us
modify the observation model in the above example to

$Y_n = |\xi_n - \xi_{n-1}| + \varepsilon\gamma_n.$

Then it can be verified that for arbitrarily small $\varepsilon > 0$, the identity (1.2) holds. It is
only in the degenerate case $\varepsilon = 0$ that (1.2) fails. This suggests that the presence of
some amount of noise, however small, is sufficient in order to ensure the validity 
of (1.2). This intuition can be made precise in a surprisingly general setting, which 
is established by the following result due to the author [25]. Here the notion of 
nondegeneracy formalizes the presence of observation noise. 

Definition 1.5. The hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$ is said to possess
nondegenerate observations if there exist a σ-finite reference measure $\varphi$ on F and
a strictly positive measurable function $g : E \times F \to\ ]0, \infty[$ such that

$\Phi(x, A) = \int 1_A(y)\, g(x, y)\, \varphi(dy)$ for all $x \in E$, $A \in \mathcal{B}(F)$.

THEOREM 1.6 ([25]). Given a stationary hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$
as defined in this section, with $P$-invariant measure $\mathbf{P}[X_0 \in \cdot\,] := \mu$, assume that:






1. The hidden process $(X_k)_{k\in\mathbb{Z}}$ is absolutely regular:

(1.3)   $\mathbf{E}\big[\|\mathbf{P}[X_n \in \cdot \mid X_0] - \mu\|_{\mathrm{TV}}\big] \xrightarrow{n\to\infty} 0.$

2. The observations are nondegenerate.

Then the identity (1.1) holds true.

This result resolves the validity of (1.1) in many cases of interest. Indeed, the 
mixing assumption (1.3) holds in a very broad class of applications, and a well- 
established theory provides a powerful set of tools to verify this assumption [19]. 
Nonetheless, the assumption (1.3) is strictly stronger than the assumption that the 
hidden process is purely nondeterministic; the latter is equivalent to 

$\mathbf{E}\big[|\mathbf{P}[X_n \in A \mid X_0] - \mu(A)|\big] \xrightarrow{n\to\infty} 0$ for all $A \in \mathcal{B}(E)$

(see [24], Proposition 3). If, as one might conjecture, nondegeneracy of the obser- 
vations suffices to justify the exchange of intersection and supremum (1.2), then 
Theorem 1.6 should already hold when the hidden process is only purely non- 
deterministic, i.e., Kunita's claim would hold true whenever the observations are 
nondegenerate. This stronger result was conjectured in [25], pp. 1877-1878. 

CONJECTURE 1.7. If the hidden process is purely nondeterministic and the 
observations are nondegenerate, then (1.1) holds true. 

Conjecture 1.7 seems tantalizingly close to Theorem 1.6, particularly if we
rephrase (1.3) in terms of tail σ-fields. Indeed, let $\mathbf{P}^x$ be a version of the
regular conditional probability $\mathbf{P}^{X_0} = \mathbf{P}[\,\cdot \mid X_0]$. Then, from the results of [25], for
example, one may read off the following equivalent formulation of (1.3):

There exists a set $E_0 \in \mathcal{B}(E)$ such that $\mu(E_0) = 1$ and
$\mathbf{P}^x[A] = \mathbf{P}^y[A] \in \{0, 1\}$ for all $A \in \bigcap_{n\le 0} \mathcal{F}^X_{-\infty,n}$ and $x, y \in E_0$.

On the other hand, clearly $(X_k)_{k\in\mathbb{Z}}$ is purely nondeterministic if and only if

For any $A \in \bigcap_{n\le 0} \mathcal{F}^X_{-\infty,n}$, there exists $E_0 \in \mathcal{B}(E)$ (depending possibly
on $A$) such that $\mu(E_0) = 1$ and $\mathbf{P}^x[A] = \mathbf{P}^y[A] \in \{0, 1\}$ for all $x, y \in E_0$.

Thus the difference between the assumptions is that in the latter, the set $E_0$ may
depend on $A$, while in the former $E_0$ does not depend on $A$.
1.3. Main result. The main result of this paper is that Conjecture 1.7 is false. 
We establish this by exhibiting a counterexample. 

THEOREM 1.8. There exists a stationary hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$ in
a Polish state space E × F such that the hidden process is purely nondeterministic
and the observations are nondegenerate, but nonetheless (1.1) fails to hold.

Moreover, this model may be constructed such that the transition kernel $P$ of
the hidden process is Feller, and such that the observations are of standard additive
noise type $Y_n = h(X_n) + \varepsilon\gamma_n$, where $h : E \to \mathbb{R}^3$ is a bounded continuous function,
$\varepsilon > 0$ and $(\gamma_k)_{k\in\mathbb{Z}}$ are standard Gaussian random variables in $\mathbb{R}^3$.

The counterexample to Conjecture 1.7, whose existence is guaranteed by this 
result, must surely yield a nasty filtering problem! Yet, Theorem 1.8 indicates the 
model need not even be too nasty: the example can be chosen to satisfy standard 
regularity assumptions and to use a perfectly ordinary observation model. It therefore
seems doubtful that the general result of Theorem 1.6 can be substantially 
weakened; absolute regularity (1.3) is evidently essential. 

Let us briefly explain the intuition behind the counterexample. We aim to mimic 
the noiseless counterexample in section 1.1. The idea is to construct a variant of that 
model which has very long memory: we can then hope to average out the additional 
observation noise (needed to make the observations nondegenerate), reverting es- 
sentially to the noiseless case. On the other hand, we cannot give the process such 
long memory that it ceases to be purely nondeterministic. The following construc- 
tion strikes a balance between these competing goals. We reconsider the example 
of section 1.1 not as a time series, but as a random scenery. We then construct 
a stochastic process by running a random walk on the integers, and reporting at 
each time the value of the scenery at the current location of the walk. The resulting 
random walk in random scenery [8, 11] is purely nondeterministic, yet has a very 
long memory due to the recurrence of the random walk. The latter is exploited by 
a remarkable scenery reconstruction result of Matzinger and Rolles [16] which al- 
lows us to average out the observation noise. Theorem 1.8 follows essentially by 
combining the scenery reconstruction with the example of section 1.1, except that 
we must work in a slightly larger state space for technical reasons. 

Remark 1.9. Random walks in random scenery are closely related to the
$T, T^{-1}$-process, which was conjectured by Weiss ([30], p. 682) and later proved
by Kalikow [10] to be a natural example of a K-process that is not a B-process. In
the language of ergodic theory, a purely nondeterministic process is a K-process
[20] while a process that satisfies (1.1) is a K-process relative to the observations [21]. Our
example may thus be interpreted as a K-process that is not K relative to a
nondegenerate observation process. The absolute regularity property (1.3) is equivalent






to the weak Bernoulli property in ergodic theory (cf. [28]). 

We end this section with a brief discussion of the practical implications of The- 
orem 1.8. The mixing assumption (1.3) required by Theorem 1.6 states that the 
law of the hidden process converges in the sense of total variation to the invariant
measure $\mu$ for almost every initial condition. This occurs in a wide variety of 
applications [19], as long as the hidden state space E is finite dimensional. In in- 
finite dimensions, however, most probability measures are mutually singular and 
total variation convergence is rare. When the hidden process is defined by the so- 
lution of a stochastic partial differential equation, for example, typically the best 
we can hope for is weak convergence to the invariant measure. In this case (1.3) 
fails, though the process is still purely nondeterministic. Our main result indicates 
that nice ergodic properties of the nonlinear filter cannot be taken for granted in 
the infinite dimensional setting. This is unfortunate, as infinite dimensional filter- 
ing problems appear naturally in important applications such as weather prediction 
and geophysical or oceanographic data assimilation (see, e.g., [14]), while ergodic- 
ity of the nonlinear filter is essential to reliable performance of filtering algorithms 
[26]. The current state of knowledge on the ergodic theory of infinite dimensional 
filtering problems appears to be essentially nonexistent. 

The remainder of this paper is organized as follows. In section 2 we introduce 
the various stochastic processes needed to construct our counterexample. Sections 
3 and 4 are devoted to the proof of Theorem 1.8. The appendix reviews the ergodic 
theory of nonlinear filters (including a proof of Theorem 1.1). 

2. Construction. In the following, we will work on the canonical probability
space $(\Omega, \mathcal{F}, \mathbf{P})$ which supports the following independent random variables.

• $(\eta_k)_{k\in\mathbb{Z}}$, $\xi_0$ are i.i.d. random variables, uniformly distributed in {0, 1, 2}.

• $(\delta_k)_{k\in\mathbb{Z}}$ are i.i.d. random variables, uniformly distributed in {−1, 1}.

• $(\gamma_k)_{k\in\mathbb{Z}}$ are i.i.d. standard Gaussian random variables in $\mathbb{R}^3$.

Denote by $\{e(0), e(1), e(2)\} \subset \mathbb{R}^3$ the canonical basis in $\mathbb{R}^3$.

We now proceed to define various stochastic processes. Define recursively

$\xi_n = \begin{cases} (\xi_{n-1} + \eta_n) \bmod 3 & \text{for } n > 0, \\ (\xi_{n+1} - \eta_{n+1}) \bmod 3 & \text{for } n < 0. \end{cases}$

Note that $(\xi_k)_{k\in\mathbb{Z}}$ is an i.i.d. sequence uniformly distributed in {0, 1, 2}, and

$\eta_n = (\xi_n - \xi_{n-1}) \bmod 3.$

Next, we define the simple random walk $(N_k)_{k\in\mathbb{Z}}$ on $\mathbb{Z}$ as

$N_n = \begin{cases} \sum_{k=1}^{n} \delta_k & \text{for } n \ge 0, \\ -\sum_{k=n+1}^{0} \delta_k & \text{for } n < 0. \end{cases}$
We can now define the random walk in random scenery $(Z_k)_{k\in\mathbb{Z}}$ which takes values
in the set $I := \{-1, 1\} \times \{0, 1, 2\} \times \{0, 1, 2\}$ as follows:

$Z_n = (Z_{n,0}, Z_{n,1}, Z_{n,2}) = (\delta_{n+1}, \xi_{N_n - 1}, \xi_{N_n}).$

It is not difficult to see that $(Z_n)_{n\in\mathbb{Z}}$ is a stationary process. We finally make the
process Markovian by defining the $I^{\mathbb{Z}_+}$-valued process $(X_n)_{n\in\mathbb{Z}}$ as

$X_n = (Z_k)_{k\ge n}$ (that is, $X_{n,k} = Z_{n+k}$ for $k \in \mathbb{Z}_+$),

and we define the $\mathbb{R}^3$-valued observation process $(Y_k)_{k\in\mathbb{Z}}$ as

$Y_n = h(X_n) + \varepsilon\gamma_n = e(\eta_{N_n}) + \varepsilon\gamma_n,$

where $\varepsilon > 0$ is a fixed constant and $h : I^{\mathbb{Z}_+} \to \mathbb{R}^3$ is defined as

$h(x) = e((x_{0,2} - x_{0,1}) \bmod 3).$

It is evident that the pair $(X_n, Y_n)_{n\in\mathbb{Z}}$ defines a stationary hidden Markov model
taking values in the Polish space $I^{\mathbb{Z}_+} \times \mathbb{R}^3$ and with nondegenerate observations.
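
To make the construction concrete, here is a minimal simulation sketch in Python. It is an illustration under simplifying assumptions, not part of the construction itself: it only simulates times n = 0, ..., steps − 1 on a finite scenery window, and the function name and the returned dictionary layout are hypothetical choices for this example.

```python
import random


def simulate_construction(steps, eps, seed=0):
    """Simulate Z_n and Y_n for n = 0, ..., steps - 1 on a finite scenery window."""
    rng = random.Random(seed)
    # Scenery increments eta_j and scenery values xi_j on all positions the walk can reach.
    eta = {j: rng.randrange(3) for j in range(-steps, steps + 1)}
    xi = {0: rng.randrange(3)}
    for j in range(1, steps + 1):
        xi[j] = (xi[j - 1] + eta[j]) % 3
    for j in range(-1, -steps - 2, -1):
        xi[j] = (xi[j + 1] - eta[j + 1]) % 3
    # Simple random walk: walk[n] = N_n with N_0 = 0; delta[n] holds delta_{n+1}.
    delta = [rng.choice((-1, 1)) for _ in range(steps)]
    walk = [0]
    for d in delta:
        walk.append(walk[-1] + d)
    # Z_n = (delta_{n+1}, xi_{N_n - 1}, xi_{N_n}).
    z = [(delta[n], xi[walk[n] - 1], xi[walk[n]]) for n in range(steps)]
    # Y_n = e((Z_{n,2} - Z_{n,1}) mod 3) + eps * gamma_n, a vector in R^3.
    y = []
    for n in range(steps):
        label = (z[n][2] - z[n][1]) % 3
        y.append([(1.0 if i == label else 0.0) + eps * rng.gauss(0.0, 1.0)
                  for i in range(3)])
    return {"eta": eta, "xi": xi, "walk": walk, "z": z, "y": y}


model = simulate_construction(steps=10, eps=0.1, seed=0)
print(model["walk"])
print(model["z"][:3])
```

In this sketch the noiseless part of each observation is the scenery increment at the current location of the walk, in line with the identity $h(X_n) = e(\eta_{N_n})$.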
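Let us define the σ-fields

$\mathcal{F}^X_{m,n} = \sigma\{X_k : k \in [m, n]\}, \qquad \mathcal{F}^Y_{m,n} = \sigma\{Y_k : k \in [m, n]\},$

for $m, n \in \mathbb{Z}$, $m \le n$. The σ-fields $\mathcal{F}^X_{-\infty,n}$, $\mathcal{F}^Y_{m,\infty}$, etc., are defined in the usual
fashion (for example, $\mathcal{F}^X_{-\infty,n} = \bigvee_{m\le n} \mathcal{F}^X_{m,n}$). Our main result is now as follows.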

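THEOREM 2.1. For the hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$ with nondegenerate
observations, as defined in this section, the following hold:

1. The future tail σ-field

$\mathcal{T} := \bigcap_{n\ge 0} \mathcal{F}^X_{n,\infty}$ is $\mathbf{P}$-a.s. trivial.

2. We have the strict inclusion

$\bigcap_{n\ge 0} \big(\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty}\big) \supsetneq \mathcal{F}^Y_{0,\infty}$

provided that $\varepsilon > 0$ is chosen sufficiently small.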

The proof of this result, given in section 3 below, is based on mixing and recon- 
struction results for random walks in random scenery [17, 16]. 

The model of Theorem 2.1 is time-reversed from the counterexample to be pro- 
vided by Theorem 1.8. It is immediate from the Markov property, however, that the 
time reversal of a stationary hidden Markov model yields again a stationary hidden 
Markov model. Therefore, the following corollary is immediate: 
COROLLARY 2.2. For $\varepsilon > 0$ sufficiently small, the time-reversed model

$(X_{-k}, Y_{-k})_{k\in\mathbb{Z}}$

is purely nondeterministic and has nondegenerate observations, but (1.1) fails.

This proves the first part of Theorem 1.8 and settles Conjecture 1.7. However,
when constructed in this manner, the transition kernel of $(X_{-k})_{k\in\mathbb{Z}}$ cannot be chosen
to satisfy the Feller property on $I^{\mathbb{Z}_+}$. Some further effort is therefore required to
complete the proof of Theorem 1.8, which we postpone to section 4.

3. Proof of Theorem 2.1. 

3.1. First part. Consider the stochastic process $\tilde\xi_n := (\xi_{n-1}, \xi_n)$. It is easily
seen that this is a stationary, irreducible and aperiodic Markov chain taking values
in the space {0, 1, 2} × {0, 1, 2}, so that $(\tilde\xi_k)_{k\in\mathbb{Z}}$ is an ergodic process. The triviality
of $\mathcal{T}$ now follows from the Theorem in [17], p. 267 (this follows in particular from
equation (3) in [17] using [23], Theorem 7.9).

3.2. Second part. Consider the modified observation process $(Y'_k)_{k\in\mathbb{Z}}$ taking
values in {0, 1, 2}, defined as follows:

$Y'_n = \operatorname{argmax}_{i=0,1,2} Y_{n,i}.$

That is, $Y'_n$ is the coordinate index of the largest component of the vector $Y_n \in \mathbb{R}^3$.
By symmetry, it is easily seen that for some $\delta > 0$ depending on $\varepsilon$

$\mathbf{P}[Y'_n = i \mid \eta_{N_n} = j] = \frac{\delta}{3} \quad \forall\, i \ne j, \qquad \mathbf{P}[Y'_n = i \mid \eta_{N_n} = i] = 1 - \frac{2\delta}{3} \quad \forall\, i,$

where $\delta \downarrow 0$ as $\varepsilon \downarrow 0$. The conditional law of $Y'_n$ can therefore be generated as
follows: draw a Bernoulli random variable with parameter $\delta$; if it is zero, set $Y'_n =
\eta_{N_n}$, otherwise let $Y'_n$ be a random draw from the uniform distribution on {0, 1, 2}.
We can now apply the scenery reconstruction result from [16].
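
The quantisation step above can be explored numerically. The following Python sketch is a rough Monte Carlo illustration (the function names, sample sizes and seeds are assumptions made for this example): it estimates one row of the confusion probabilities of the argmax decoder, backs out an estimate of $\delta$ from the relation $\mathbf{P}[\text{error}] = 2\delta/3$, and provides the Bernoulli mixture sampler described in the text.

```python
import random


def decode(y):
    """Return the index of the largest coordinate of y (the quantised observation)."""
    return max(range(3), key=lambda i: y[i])


def confusion_row(label, eps, samples, seed=0):
    """Estimate P[decoded = i | eta_{N_n} = label] for i = 0, 1, 2 by Monte Carlo."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(samples):
        y = [(1.0 if i == label else 0.0) + eps * rng.gauss(0.0, 1.0)
             for i in range(3)]
        counts[decode(y)] += 1
    return [c / samples for c in counts]


def mixture_draw(label, delta, rng):
    """Bernoulli(delta) mixture: keep the label, or replace it by a uniform draw."""
    return rng.randrange(3) if rng.random() < delta else label


row = confusion_row(label=0, eps=1.0, samples=20000, seed=1)
delta_hat = 1.5 * (row[1] + row[2])  # uses P[error] = 2 * delta / 3
print(row, delta_hat)
```

Up to Monte Carlo error, the two off-diagonal entries of the estimated row should agree, which is the symmetry used in the text.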

DEFINITION 3.1. Let $x, y \in \{0, 1, 2\}^{\mathbb{Z}}$. We write $x \approx y$ if there exist $a \in
\{-1, 1\}$ and $b \in \mathbb{Z}$ such that $x_n = y_{an+b}$ for all $n \in \mathbb{Z}$ (that is, $x \approx y$ iff the
sequences $x$ and $y$ agree up to translation and/or reflection).

THEOREM 3.2 ([16]). There is a measurable map $\iota : \{0, 1, 2\}^{\mathbb{Z}_+} \to \{0, 1, 2\}^{\mathbb{Z}}$
such that $\mathbf{P}[\iota((Y'_k)_{k\ge 0}) \approx (\eta_k)_{k\in\mathbb{Z}}] = 1$ provided $\varepsilon > 0$ is sufficiently small.
From now on, let us fix $\varepsilon > 0$ sufficiently small and the map $\iota$ as in Theorem
3.2. By the definition of the equivalence relation $\approx$, there exist $\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^\eta_{-\infty,\infty}$-measurable
random variables $A$ and $B$, taking values in {−1, 1} and $\mathbb{Z}$, respectively,
such that $\iota((Y'_k)_{k\ge 0})_n = \eta_{An+B}$ $\mathbf{P}$-a.s. for all $n \in \mathbb{Z}$.

Remark 3.3. Let us note that, even though by construction $(\eta_{Ak+B})_{k\in\mathbb{Z}}$ is
a.s. $\mathcal{F}^Y_{0,\infty}$-measurable, it is not possible for the random variables $A$ and $B$ to be
$\mathcal{F}^Y_{0,\infty}$-measurable; see [11], Remark (ii). This will not be a problem for us.

The point of the above construction is the following claim: the random variable
$\xi_B$ is a.s. $\bigcap_{n\ge 0}(\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty})$-measurable, but it is not a.s. $\mathcal{F}^Y_{0,\infty}$-measurable. This
clearly suffices to prove the result. It thus remains to establish the claim.

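LEMMA 3.4. The random variable $\xi_B$ is $\mathbf{P}$-a.s. $\bigcap_{n\ge 0}(\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty})$-measurable.

PROOF. Fix $n \in \mathbb{Z}$. Define the random variables $(\tau_k)_{k\in\mathbb{Z}}$ as

$\tau_j = \inf\Big\{k \ge 0 : \sum_{\ell=0}^{k-1} X_{n,\ell,0} = j\Big\},$

and define the random variables $(\xi'_k)_{k\in\mathbb{Z}}$ as

$\xi'_j = X_{n,\tau_j,2}.$

Then clearly $(\xi'_k)_{k\in\mathbb{Z}}$ is $\mathcal{F}^X_{n,\infty}$-measurable and $\mathbf{P}[(\xi'_k)_{k\in\mathbb{Z}} \approx (\xi_k)_{k\in\mathbb{Z}}] = 1$.

We now claim that we can "align" $(\xi'_k)_{k\in\mathbb{Z}}$ with $(\eta_{Ak+B})_{k\in\mathbb{Z}}$. Indeed, note that
for any $b \in \mathbb{Z}$ (with $b \ne 0$ in the first line), we can estimate

$\mathbf{P}[\eta_k = \eta_{k+b} \text{ for all } k \in \mathbb{Z}] \le \mathbf{P}[\eta_0 = \eta_{kb} \text{ for all } k \ge 1] = 0,$

$\mathbf{P}[\eta_k = \eta_{-k+b} \text{ for all } k \in \mathbb{Z}] \le \prod_{k=b}^{\infty} \mathbf{P}[\eta_k = \eta_{-k+b}] = 0,$

where we have used that $(\eta_k)_{k\in\mathbb{Z}}$ are i.i.d. and nondeterministic. Therefore

$\mathbf{P}[\text{there exist } a \in \{-1, 1\},\ b \in \mathbb{Z},\ (a, b) \ne (1, 0), \text{ such that } \eta_k = \eta_{ak+b} \text{ for all } k \in \mathbb{Z}] = 0.$

In particular, if we define $(\eta'_k)_{k\in\mathbb{Z}}$ as

$\eta'_j = (\xi'_j - \xi'_{j-1}) \bmod 3,$

it follows that there must exist $\mathbf{P}$-a.s. unique $\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty}$-measurable random
variables $A'$ and $B'$, taking values in {−1, 1} and $\mathbb{Z}$, respectively, such that

$\eta'_{A'j+B'} = \eta_{Aj+B}$ for all $j \in \mathbb{Z}$   $\mathbf{P}$-a.s.

It follows by uniqueness that

$\xi'_{A'j+B'} = \xi_{Aj+B}$ for all $j \in \mathbb{Z}$   $\mathbf{P}$-a.s.

In particular, $\xi'_{B'} = \xi_B$ $\mathbf{P}$-a.s. But $\xi'_{B'}$ is $\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty}$-measurable by construction.
Therefore, we have shown that $\xi_B$ is $\mathbf{P}$-a.s. $\mathcal{F}^Y_{0,\infty} \vee \mathcal{F}^X_{n,\infty}$-measurable. As the choice
of $n$ was arbitrary, the proof is easily completed. □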
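
The first step of the proof above, reading the scenery off the hidden state through hitting times of the embedded walk, can be sketched on a finite prefix of $X_n$. The Python snippet below is an illustration only: the helper names are hypothetical, the hitting time is computed under the convention that the walk's displacement before index k is the sum of the first k step coordinates, and positions not reached within the prefix are reported as None.

```python
def make_prefix(scenery, steps, start=0):
    """Build a finite prefix of X_n from a scenery dict and a list of walk increments.

    The walk sits at position `start` (the role of N_n) when the prefix begins.
    """
    prefix, pos = [], start
    for step in steps:
        prefix.append((step, scenery[pos - 1], scenery[pos]))
        pos += step
    return prefix


def hitting_time(prefix, j):
    """First k whose preceding steps sum to j, or None if j is not reached in the prefix."""
    displacement = 0
    for k, (step, _, _) in enumerate(prefix):
        if displacement == j:
            return k
        displacement += step
    return None


def read_scenery(prefix, j):
    """Return the third coordinate at time tau_j, i.e. the scenery at relative position j."""
    k = hitting_time(prefix, j)
    return None if k is None else prefix[k][2]


scenery = {p: (p * p + 1) % 3 for p in range(-10, 11)}
prefix = make_prefix(scenery, [1, 1, -1, -1, -1, -1, 1, 1, 1, 1], start=2)
print([read_scenery(prefix, j) for j in range(-2, 3)])
print([scenery[2 + j] for j in range(-2, 3)])
```

The two printed lists coincide: the values read off the state are a translate of the scenery, which is the finite-prefix analogue of the almost sure equivalence between $(\xi'_k)$ and $(\xi_k)$.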

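Lemma 3.5. The random variable $\xi_B$ is not $\mathbf{P}$-a.s. $\mathcal{F}^Y_{0,\infty}$-measurable.

Proof. Write $\mathcal{G} := \mathcal{F}^\eta_{-\infty,\infty} \vee \mathcal{F}^\delta_{-\infty,\infty} \vee \mathcal{F}^\gamma_{-\infty,\infty}$. Note that $\mathbf{P}$-a.s.

$\mathbf{P}[\xi_B = i, B = j \mid \mathcal{G}] = 1_{B=j}\, \mathbf{P}[\xi_j = i \mid \mathcal{G}]$

$= 1_{B=j}\, \{\mathbf{P}[\xi_0 = i \mid \mathcal{G}] \circ \Theta^j\}$

$= 1_{B=j}\, \mathbf{P}[\xi_0 = i].$

Here we have used that $B$ is $\mathcal{G}$-measurable for the first equality,
stationarity of the law of $(\xi_k, \delta_k, \gamma_k)_{k\in\mathbb{Z}}$ for the second equality ($\Theta$ denotes the
canonical shift), and independence of $\xi_0$ and $(\eta_k, \delta_k, \gamma_k)_{k\in\mathbb{Z}}$ for the third equality.
Summing over $j$, and conditioning on $\mathcal{F}^Y_{0,\infty}$, we obtain

$\mathbf{P}[\xi_B = i \mid \mathcal{F}^Y_{0,\infty}] = \mathbf{P}[\xi_0 = i] = 1/3$   $\mathbf{P}$-a.s.

Thus $\xi_B$ is independent from $\mathcal{F}^Y_{0,\infty}$, hence not $\mathbf{P}$-a.s. $\mathcal{F}^Y_{0,\infty}$-measurable. □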

Remark 3.6. The additive noise model $Y_n = h(X_n) + \varepsilon\gamma_n$ is inessential to
the proof; we could have just as easily started from the {0, 1, 2}-valued observation
model $Y'_n$ as in [16]. The only reason we have chosen to construct our example with 
the additive noise model is to make the point that there is nothing special about the 
choice of observations: one does not have to "cook up" a complicated observation 
model to make the counterexample work. All the unpleasantness arises from the 
ergodic theory of random walks in random scenery. 

4. Proof of Theorem 1.8. For any $x \in I^{\mathbb{Z}_+}$, define

$\tau_j(x) = \inf\Big\{k \ge 0 : \sum_{\ell=0}^{k-1} x_{\ell,0} = j\Big\}.$

Now define the space

$E := \{x \in I^{\mathbb{Z}_+} : \tau_j(x) < \infty \text{ for all } j \in \mathbb{Z}\} \subset I^{\mathbb{Z}_+}.$

We endow E with the topology of pointwise convergence (inherited from $I^{\mathbb{Z}_+}$).

Lemma 4.1. E is Polish.

PROOF. For $x, x' \in E$, define the metric

$d(x, x') := \sum_{k=0}^{\infty} 2^{-k}\, 1_{x_k \ne x'_k} + \sum_{j=-\infty}^{\infty} 2^{-|j|}\, \{|\tau_j(x) - \tau_j(x')| \wedge 1\}.$

It suffices to prove that $d$ metrizes the topology of pointwise convergence in E
(which is certainly separable) and that $(E, d)$ is a complete metric space.

We first prove that $d$ metrizes the topology of pointwise convergence. Clearly
$d(x_n, x) \to 0$ as $n \to \infty$ implies that $x_n \to x$ pointwise. Conversely, suppose
that $x_n \to x$ as $n \to \infty$ pointwise. It suffices to show that $\tau_j(x_n) \to \tau_j(x)$ as
$n \to \infty$ for all $j \in \mathbb{Z}$. But as $\tau_j(x) < \infty$ by assumption (as $x \in E$), it follows
that $\tau_j(x_n) = \tau_j(x)$ whenever $x_{n,k} = x_k$ for all $k \le \tau_j(x)$, which is the case for $n$
sufficiently large by pointwise convergence. This establishes the claim.

It remains to show that $(E, d)$ is complete. To this end, let $(x_n)_{n\in\mathbb{N}}$ be a Cauchy
sequence for the metric $d$. Then it is clearly Cauchy for

$\tilde d(x, x') := \sum_{k=0}^{\infty} 2^{-k}\, 1_{x_k \ne x'_k},$

which defines a complete metric for the topology of pointwise convergence on
$I^{\mathbb{Z}_+}$. Therefore, there exists $x \in I^{\mathbb{Z}_+}$ such that $x_n \to x$ as $n \to \infty$ pointwise.
It suffices to show that $x \in E$. Indeed, when this is the case, it follows
immediately that $d(x_n, x) \to 0$ as $n \to \infty$ (as we have shown that $d$ metrizes pointwise
convergence in E), thus proving completeness of $(E, d)$.

To complete the proof, suppose that $x \notin E$. Then there exists $j \in \mathbb{Z}$ such that
$\tau_j(x) = \infty$. In particular, if $x_{n,k} = x_k$ for all $k \le N < \infty$, then $\tau_j(x_n) > N$. As
this is the case for $n$ sufficiently large by pointwise convergence, it follows that

$\sup_{m\ge n} d(x_m, x_n) \ge 2^{-|j|} \sup_{m\ge n} |\tau_j(x_m) - \tau_j(x_n)| \wedge 1 = 2^{-|j|}$ for all $n \ge 1$.

This contradicts the Cauchy property of $(x_n)_{n\in\mathbb{N}}$. □

Denote by $\mathbf{P}[X_0 \in \cdot\,] := \mu$ the invariant measure of the $I^{\mathbb{Z}_+}$-valued Markov
process $(X_{-k})_{k\in\mathbb{Z}}$ defined in section 2. It is clear that E is measurable as a subset of
$I^{\mathbb{Z}_+}$ and that $\mu(E) = 1$. We are going to construct a Feller transition kernel $P$ from
E to E with stationary measure $\mu$ (restricted to E), such that the corresponding
stationary E-valued Markov process coincides a.s. with the stationary $I^{\mathbb{Z}_+}$-valued
Markov process $(X_{-k})_{k\in\mathbb{Z}}$ defined in section 2.
LEMMA 4.2. Define the transition kernel $P : E \times \mathcal{B}(E) \to [0, 1]$ as follows:

$P(x, \{T_1(x)\}) = P(x, \{T_{-1}(x)\}) = \tfrac{1}{2},$

where $T_a : E \to E$, $a \in \{-1, 1\}$ are defined as

$T_a(x) = [(a, x_{\tau_{-a}(x),1}, x_{\tau_{-a}(x),2}), x]$

(here $[z, x]$ denotes the sequence obtained by prepending $z \in I$ to $x$).
Then the law under $\mathbf{P}$ of the process $(X_{-k})_{k\in\mathbb{Z}}$ defined in section 2 is that of a
stationary Markov process taking values in E with transition kernel $P$ and invariant
measure $\mu$. Moreover, $P$ satisfies the Feller property.

PROOF. It follows along the lines of the proof of Lemma 4.1 that the functions
$T_1$ and $T_{-1}$ are continuous. Therefore, the Feller property of $P$ is immediate.

To complete the proof, it suffices (as clearly $X_n \in E$ $\mathbf{P}$-a.s. for all $n \in \mathbb{Z}$ and
as $(X_{-k})_{k\in\mathbb{Z}}$ is a stationary Markov process) to show that

$\mathbf{P}[X_{-1} \in A \mid X_0] = P(X_0, A)$   $\mathbf{P}$-a.s. for all $A \in \mathcal{B}(E)$.

To this end, note that

$X_{-1} = [(\delta_0, \xi_{-\delta_0-1}, \xi_{-\delta_0}), X_0] = [(\delta_0, X_{0,\tau_{-\delta_0}(X_0),1}, X_{0,\tau_{-\delta_0}(X_0),2}), X_0]$   $\mathbf{P}$-a.s.

Moreover, as $X_0$ is $\mathcal{F}^\xi_{-\infty,\infty} \vee \mathcal{F}^\delta_{1,\infty}$-measurable, it follows from the construction in
section 2 that $\delta_0$ is independent of $X_0$. The result follows directly. □
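
The maps $T_a$ and the resulting kernel can be mimicked on finite prefixes. The following Python sketch is illustrative only, with hypothetical helper names, and it uses the same hitting-time convention as the definition of $\tau_j$ in this section. It prepends the triple $(a, x_{\tau_{-a},1}, x_{\tau_{-a},2})$ to a prefix and samples one transition by choosing $a$ uniformly in {−1, 1}.

```python
import random


def hitting_time(prefix, j):
    """First k whose preceding steps sum to j, or None if j is not reached in the prefix."""
    displacement = 0
    for k, (step, _, _) in enumerate(prefix):
        if displacement == j:
            return k
        displacement += step
    return None


def prepend(prefix, a):
    """Finite-prefix analogue of T_a: prepend (a, x_{tau_{-a},1}, x_{tau_{-a},2})."""
    k = hitting_time(prefix, -a)
    if k is None:
        raise ValueError("prefix too short: the walk has not reached -a yet")
    _, left, right = prefix[k]
    return [(a, left, right)] + prefix


def sample_transition(prefix, rng):
    """One step of the kernel: apply T_1 or T_{-1}, each with probability 1/2."""
    return prepend(prefix, rng.choice((-1, 1)))


scenery = {p: (2 * p + 1) % 3 for p in range(-6, 7)}
steps, positions = [1, -1, -1, 1, 1], [0, 1, 0, -1, 0]
x0 = [(s, scenery[p - 1], scenery[p]) for s, p in zip(steps, positions)]
print(sample_transition(x0, random.Random(0))[0])
```

The new first entry only uses scenery values that the prefix has already visited, which is the reason the transition can be expressed as a function of the current state and an independent coin flip.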

Proof of Theorem 1.8. Construct the canonical $E \times \mathbb{R}^3$-valued stationary
hidden Markov model $(X'_k, Y'_k)_{k\in\mathbb{Z}}$ such that the hidden process $(X'_k)_{k\in\mathbb{Z}}$ has
transition kernel $P$ and invariant measure $X'_0 \sim \mu$, and with the observation model
$Y'_n = h(X'_n) + \varepsilon\gamma_n$ where $(\gamma_k)_{k\in\mathbb{Z}}$ is an i.i.d. sequence of standard Gaussian
random variables in $\mathbb{R}^3$ independent of $(X'_k)_{k\in\mathbb{Z}}$. Clearly E and $\mathbb{R}^3$ are Polish by
Lemma 4.1, the observations are nondegenerate, $h : E \to \mathbb{R}^3$ (defined in section
2) is bounded and continuous, and $P$ is Feller by Lemma 4.2. Moreover, the law of
the model $(X'_k, Y'_k)_{k\in\mathbb{Z}}$ coincides with that of $(X_{-k}, Y_{-k})_{k\in\mathbb{Z}}$ as defined in section 2.
Therefore, by Corollary 2.2, $(X'_k)_{k\in\mathbb{Z}}$ is purely nondeterministic but (1.1) fails for
this model when $\varepsilon > 0$ is chosen sufficiently small. □

APPENDIX A: ERGODIC THEORY OF NONLINEAR FILTERS 

The goal of the appendix is to collect a few basic results on the ergodic theory of 
nonlinear filters. Similar results appear in various forms in the literature, see, for ex- 
ample, [3, 6] and the references therein. However, all known proofs require various 
simplifying assumptions, such as the Feller property or irreducibility of the hidden
process, nondegenerate observations, etc. As a general result does not appear 
to be readily available in the literature, we provide here a largely self-contained 
treatment culminating in the proof of Theorem 1.1. 

Let us note that analogous results can be obtained for continuous time, either by 
direct arguments (cf. [33]) or by reduction to discrete time (as in [25]). 

A.1. Markov property of the filter. As in the introduction, we let E and F be
Polish spaces, let $P : E \times \mathcal{B}(E) \to [0, 1]$ and $\Phi : E \times \mathcal{B}(F) \to [0, 1]$ be the
transition kernels, and let $\mu : \mathcal{B}(E) \to [0, 1]$ be the $P$-invariant measure defining the
law of the stationary hidden Markov model $(X_k, Y_k)_{k\in\mathbb{Z}}$. We denote by $\mathcal{P}(G)$ the
space of probability measures on the Polish space G, endowed with the topology
of weak convergence of probability measures.

Lemma A.1 ([18], Lemma 1). For $\nu \in \mathcal{P}(E)$, define the probability measure

$\mathbf{P}_\nu(A) = \int 1_A(x, y)\, \nu(dx')\, P(x', dx)\, \Phi(x, dy)$ for all $A \in \mathcal{B}(E \times F)$.

Denote by $X : E \times F \to E$ and $Y : E \times F \to F$ the canonical projections. There
exists a measurable map $\Pi : \mathcal{P}(E) \times F \to \mathcal{P}(E)$ such that $\Pi(\nu, Y)$ is a version of
the regular conditional probability $\mathbf{P}_\nu(X \in \cdot \mid Y)$ for every $\nu \in \mathcal{P}(E)$.

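We now define the transition kernel $\mathsf{\Pi} : \mathcal{P}(E) \times \mathcal{B}(\mathcal{P}(E)) \to [0, 1]$ as follows:

$\mathsf{\Pi}(\nu, A) = \mathbf{P}_\nu[\Pi(\nu, Y) \in A] = \int 1_A(\Pi(\nu, y))\, \nu(dx')\, P(x', dx)\, \Phi(x, dy).$

We claim that the nonlinear filter $(\pi_k)_{k\ge 0}$ is a $\mathcal{P}(E)$-valued Markov process with
transition kernel $\mathsf{\Pi}$. To prove this we will need the following result on conditioning
under a regular conditional probability due to von Weizsäcker.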

LEMMA A.2 ([29]). Let G, G' and H be Polish spaces, and denote by $g$, $g'$ and
$h$ the canonical projections from $G \times G' \times H$ on G, G' and H, respectively. Let $Q$
be a probability measure on $G \times G' \times H$, and let $q_{\cdot,\cdot} : G \times G' \times \mathcal{B}(H) \to [0, 1]$ and
$q_\cdot : G \times \mathcal{B}(G' \times H) \to [0, 1]$ be versions of the regular conditional probabilities
$Q[h \in \cdot \mid g, g']$ and $Q[(g', h) \in \cdot \mid g]$, respectively. Then for $Q$-a.e. $x \in G$, the
kernel $q_{x,g'}[\,\cdot\,]$ is a version of the regular conditional probability $q_x[h \in \cdot \mid g']$.

PROPOSITION A.3. For $n \ge 0$, let the nonlinear filter $\pi_n$ be a version of the
regular conditional probability $\mathbf{P}[X_n \in \cdot \mid Y_1, \ldots, Y_n]$. Then $(\pi_k)_{k\ge 0}$ is a $\mathcal{P}(E)$-valued
Markov process with transition kernel $\mathsf{\Pi}$ and initial measure $\pi_0 \sim \delta_\mu$.
PROOF. Fix $n \ge 1$. It is easily seen that for any $B \in \mathcal{B}(E \times F)$

$\mathbf{P}[(X_n, Y_n) \in B \mid Y_1, \ldots, Y_{n-1}] = \int 1_B(x, y)\, \pi_{n-1}(dx')\, P(x', dx)\, \Phi(x, dy).$

Using Lemmas A.2 and A.1 and uniqueness of regular conditional probabilities,
we find the recursive formula $\pi_n = \Pi(\pi_{n-1}, Y_n)$ $\mathbf{P}$-a.s. It follows easily that

$\mathbf{P}[\pi_n \in A \mid Y_1, \ldots, Y_{n-1}] = \mathsf{\Pi}(\pi_{n-1}, A)$   $\mathbf{P}$-a.s. for all $A \in \mathcal{B}(\mathcal{P}(E))$,

completing the proof. □
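
For intuition, the recursion $\pi_n = \Pi(\pi_{n-1}, Y_n)$ can be written out explicitly when E and F are finite, which is a simplifying assumption made only for this sketch (the paper works with general Polish spaces). In the Python snippet below, the kernels are row-stochastic nested lists, all function names are hypothetical, and the recursive filter is compared against a brute-force conditional probability computed by summing the joint law over all hidden paths.

```python
from itertools import product


def filter_step(prior, P, Phi, y):
    """One step pi_n = Pi(pi_{n-1}, y): predict with P, weight by Phi(x, y), normalise."""
    states = range(len(P))
    predicted = [sum(prior[xp] * P[xp][x] for xp in states) for x in states]
    weighted = [predicted[x] * Phi[x][y] for x in states]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [w / total for w in weighted]


def run_filter(mu, P, Phi, ys):
    """Iterate the recursion started from pi_0 = mu along the observations ys."""
    pi = list(mu)
    for y in ys:
        pi = filter_step(pi, P, Phi, y)
    return pi


def brute_force_filter(mu, P, Phi, ys):
    """Conditional law of X_n given Y_1, ..., Y_n by summing over all hidden paths."""
    n, states = len(ys), range(len(P))
    post = [0.0] * len(P)
    for path in product(states, repeat=n + 1):
        weight = mu[path[0]]
        for k in range(1, n + 1):
            weight *= P[path[k - 1]][path[k]] * Phi[path[k]][ys[k - 1]]
        post[path[-1]] += weight
    total = sum(post)
    return [p / total for p in post]


P = [[0.9, 0.1], [0.3, 0.7]]
Phi = [[0.8, 0.2], [0.25, 0.75]]
mu = [0.75, 0.25]  # invariant for P
ys = [0, 1, 1, 0, 1]
print(run_filter(mu, P, Phi, ys))
print(brute_force_filter(mu, P, Phi, ys))
```

The two outputs agree up to rounding, which is the finite-state content of the recursive formula used in the proof.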

We now establish the two elementary facts stated in the introduction. 

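LEMMA A.4. Let $m \in \mathcal{P}(\mathcal{P}(E))$ be any $\mathsf{\Pi}$-invariant probability measure. Then
the barycenter of $m$ is a $P$-invariant probability measure.

PROOF. Let $\bar m \in \mathcal{P}(E)$ be the barycenter of $m$. By definition,

$\bar m(A) = \int \nu(A)\, m(d\nu) = \int \nu(A)\, \mathsf{\Pi}(\nu', d\nu)\, m(d\nu')$ for $A \in \mathcal{B}(E)$.

But note that $\int \nu(A)\, \mathsf{\Pi}(\nu', d\nu) = \mathbf{E}_{\mathbf{P}_{\nu'}}[\mathbf{P}_{\nu'}(X \in A \mid Y)] = \int P(x, A)\, \nu'(dx)$ by
the definition of $\mathsf{\Pi}$. It follows directly that $\bar m P = \bar m$, that is, $\bar m$ is $P$-invariant. □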

LEMMA A.5. There is at least one $\Pi$-invariant measure with barycenter $\mu$.

PROOF. For $n \in \mathbb{Z}$, let $\pi_n$ be a version of the regular conditional probability
$\mathbf{P}[X_n \in \cdot \mid \mathcal{F}^Y_{-\infty,n}]$. Proceeding exactly as in the proof of Proposition A.3, we find
that $(\pi_k)_{k\in\mathbb{Z}}$ is a $\mathcal{P}(E)$-valued Markov process with transition kernel $\Pi$. But as the
underlying hidden Markov model $(X_k,Y_k)_{k\in\mathbb{Z}}$ is stationary, clearly $(\pi_k)_{k\in\mathbb{Z}}$ is also
stationary. Therefore, the law of $\pi_0$ is a $\Pi$-invariant measure, and its barycenter is
$\mu$ by the tower property of the conditional expectation. □
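
The tower-property step can be made explicit as follows. This is a sketch in which $m_0$ denotes the law of $\pi_0$ and $\mu$ is taken, as in the main text, to be the stationary law of $X_0$.

```latex
\[
  \int \nu(A)\, m_0(d\nu)
  = \mathbf{E}[\pi_0(A)]
  = \mathbf{E}\bigl[\mathbf{P}[X_0 \in A \mid \mathcal{F}^Y_{-\infty,0}]\bigr]
  = \mathbf{P}[X_0 \in A]
  = \mu(A),
  \qquad A \in \mathcal{B}(E).
\]
```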

A.2. Proof of Theorem 1.1: sufficiency. The proof is essentially contained in
Kunita [12], though we are careful here not to exploit any unnecessary assumptions.
The idea is to introduce a suitable randomization, which is most conveniently
done in a canonical probability model. To this end, define the Polish space
$\Omega_0 = \mathcal{P}(E) \times E \times (E \times F)^{\mathbb{N}}$ with the canonical projections $\mathsf{m}_0 : \Omega_0 \to \mathcal{P}(E)$ and
(with a slight abuse of notation) $X_0 : \Omega_0 \to E$, $(X_k,Y_k)_{k\ge 1} : \Omega_0 \to (E \times F)^{\mathbb{N}}$.
Given $m \in \mathcal{P}(\mathcal{P}(E))$, we define a probability measure $\mathbf{P}^m$ on $\Omega_0$ with the finite
dimensional distributions

\[
\mathbf{P}^m((\mathsf{m}_0, X_0,\ldots,X_n,Y_1,\ldots,Y_n) \in A) =
\int \mathbf{1}_A(\nu, x_0,\ldots,x_n,y_1,\ldots,y_n)\, \nu(dx_0)\, P(x_0,dx_1)\, \Phi(x_1,dy_1) \cdots P(x_{n-1},dx_n)\, \Phi(x_n,dy_n)\, m(d\nu).
\]

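We now define for $n \ge 0$ three distinguished nonlinear filters:

\[
\begin{aligned}
\pi_n^{\min} &:= \mathbf{P}^m[X_n \in \cdot \mid Y_1,\ldots,Y_n], \\
\pi_n^{m} &:= \mathbf{P}^m[X_n \in \cdot \mid \mathsf{m}_0, Y_1,\ldots,Y_n], \\
\pi_n^{\max} &:= \mathbf{P}^m[X_n \in \cdot \mid \mathsf{m}_0, X_0, Y_1,\ldots,Y_n].
\end{aligned}
\]

We now have the following easy result. Here $\delta_\mu, \varepsilon_\mu \in \mathcal{P}(\mathcal{P}(E))$ are defined by

\[
\delta_\mu(A) = \mathbf{1}_A(\mu) \ \text{(as usual)} \quad \text{and} \quad \varepsilon_\mu(A) = \int \mathbf{1}_A(\delta_x)\, \mu(dx).
\]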

LEMMA A.6. Let $m \in \mathcal{P}(\mathcal{P}(E))$ be any probability measure with barycenter $\mu$.
Then $(\pi_n^{\min})_{n\ge 0}$, $(\pi_n^{m})_{n\ge 0}$, $(\pi_n^{\max})_{n\ge 0}$ are $\mathcal{P}(E)$-valued Markov processes
under $\mathbf{P}^m$ with transition kernel $\Pi$ and initial measures $\delta_\mu$, $m$, $\varepsilon_\mu$, respectively.

PROOF. The proof is identical to that of Proposition A.3. □ 

The following result completes the proof of sufficiency. 

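PROPOSITION A.7. Let $p \in \mathbb{N}$, let $f_i : E \to \mathbb{R}$, $i = 1,\ldots,p$ be bounded
measurable functions, and let $\kappa : \mathbb{R}^p \to \mathbb{R}$ be convex. Define the bounded measurable
function $F : \mathcal{P}(E) \to \mathbb{R}$ as $F(\nu) = \kappa\bigl(\int f_1(x)\, \nu(dx), \ldots, \int f_p(x)\, \nu(dx)\bigr)$.
Finally, let $m \in \mathcal{P}(\mathcal{P}(E))$ be any $\Pi$-invariant measure with barycenter $\mu$. Then

\[
\mathbf{E}\bigl[\kappa\bigl(\mathbf{E}[f_1(X_0) \mid \mathcal{F}^Y_{-\infty,0}],\ldots,\mathbf{E}[f_p(X_0) \mid \mathcal{F}^Y_{-\infty,0}]\bigr)\bigr] \le \int F(\nu)\, m(d\nu)
\le \mathbf{E}\bigl[\kappa\bigl(\mathbf{E}[f_1(X_0) \mid \mathcal{G}_{-\infty,0}],\ldots,\mathbf{E}[f_p(X_0) \mid \mathcal{G}_{-\infty,0}]\bigr)\bigr],
\]

where $\mathcal{G}_{-\infty,0} := \bigcap_{n \le 0} (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})$. In particular, if (1.1) holds, $m$ coincides
with the distinguished $\Pi$-invariant measure defined in the proof of Lemma A.5.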

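PROOF. Note that as $\kappa$ is convex, it is continuous, hence $F$ is bounded and
measurable. It is an immediate consequence of Jensen's inequality that

\[
\mathbf{E}^m[F(\pi_n^{\min})] \le \mathbf{E}^m[F(\pi_n^{m})] = \int F(\nu)\, m(d\nu) \le \mathbf{E}^m[F(\pi_n^{\max})]
\]

for every $n \ge 0$, where we have used Lemma A.6 and the $\Pi$-invariance of $m$ to
obtain the middle equality. Using Lemma A.6 and the stationarity of $(X_k,Y_k)_{k\in\mathbb{Z}}$
under $\mathbf{P}$, it is also easily seen that the laws of $\pi_n^{\min}(f)$, $\pi_n^{\max}(f)$ under $\mathbf{P}^m$ coincide
with the laws of $\mathbf{E}[f(X_0) \mid Y_{-n+1},\ldots,Y_0]$, $\mathbf{E}[f(X_0) \mid X_{-n}, Y_{-n+1},\ldots,Y_0]$ under
$\mathbf{P}$, respectively. We therefore have for every $n \ge 0$

\[
\mathbf{E}\bigl[\kappa\bigl(\mathbf{E}[f_1(X_0) \mid \mathcal{F}^Y_{-n+1,0}],\ldots,\mathbf{E}[f_p(X_0) \mid \mathcal{F}^Y_{-n+1,0}]\bigr)\bigr] \le \int F(\nu)\, m(d\nu)
\le \mathbf{E}\bigl[\kappa\bigl(\mathbf{E}[f_1(X_0) \mid \mathcal{G}_{-n,0}],\ldots,\mathbf{E}[f_p(X_0) \mid \mathcal{G}_{-n,0}]\bigr)\bigr],
\]

where $\mathcal{G}_{-n,0} := \mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,-n}$ and we have used the fact that

\[
\mathbf{E}[f(X_0) \mid X_{-n}, Y_{-n+1},\ldots,Y_0] = \mathbf{E}[f(X_0) \mid \mathcal{G}_{-n,0}] \quad \mathbf{P}\text{-a.s.}
\]

as $\mathcal{F}^X_{-n+1,0} \vee \mathcal{F}^Y_{-n+1,0}$ is conditionally independent of $\mathcal{F}^X_{-\infty,-n-1} \vee \mathcal{F}^Y_{-\infty,-n}$ given
$X_{-n}$. But as $\kappa$ is continuous, the equation display in the statement of the result
follows by letting $n \to \infty$ using the martingale convergence theorem.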
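
The Jensen step can be spelled out as follows. This is a sketch only; $\mathsf{m}_0$ denotes the canonical $\mathcal{P}(E)$-valued projection on the randomized space, and $\pi(f) := \int f\, d\pi$. The three filters condition on nested $\sigma$-fields, so each is a conditional expectation of the next, and the conditional Jensen inequality applies to the convex function $\kappa$.

```latex
% Nested conditioning under \mathbf{P}^m:
%   \sigma(Y_1,\dots,Y_n) \subseteq \sigma(\mathsf{m}_0,Y_1,\dots,Y_n)
%                         \subseteq \sigma(\mathsf{m}_0,X_0,Y_1,\dots,Y_n).
\begin{align*}
  \pi_n^{\min}(f_i) &= \mathbf{E}^m\bigl[\pi_n^{m}(f_i) \,\big|\, Y_1,\dots,Y_n\bigr], \\
  \pi_n^{m}(f_i)    &= \mathbf{E}^m\bigl[\pi_n^{\max}(f_i) \,\big|\, \mathsf{m}_0,Y_1,\dots,Y_n\bigr].
\end{align*}
% Conditional Jensen for the convex function \kappa on \mathbb{R}^p:
\begin{align*}
  F(\pi_n^{\min}) &\le \mathbf{E}^m\bigl[F(\pi_n^{m}) \,\big|\, Y_1,\dots,Y_n\bigr], \\
  F(\pi_n^{m})    &\le \mathbf{E}^m\bigl[F(\pi_n^{\max}) \,\big|\, \mathsf{m}_0,Y_1,\dots,Y_n\bigr].
\end{align*}
% Taking expectations gives the two inequalities. The middle equality holds
% because \pi_n^{m} has law m\Pi^n = m under \mathbf{P}^m.
```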

Now suppose that (1.1) holds, and denote by $m_0$ the distinguished $\Pi$-invariant
measure obtained in the proof of Lemma A.5. Then we have evidently shown
that $\int F(\nu)\, m(d\nu) = \int F(\nu)\, m_0(d\nu)$ for all functions $F$ of the form $F(\nu) =
\kappa\bigl(\int f_1(x)\, \nu(dx), \ldots, \int f_p(x)\, \nu(dx)\bigr)$ for any $p$, bounded measurable $f_1,\ldots,f_p$
and convex $\kappa$. We claim that this class of functions is measure-determining, so we
can conclude that $m = m_0$. To establish the claim, first note that by the Stone-
Weierstrass theorem, any continuous function on $\mathbb{R}^p$ can be approximated uniformly
on any compact set by the difference of convex functions. As $f_1,\ldots,f_p$
are bounded (hence take values in a compact subset of $\mathbb{R}^p$), it therefore suffices
to assume that $\kappa$ is continuous rather than convex. Next, note that the indicator
function $\mathbf{1}_A$ of any open subset $A$ of $\mathbb{R}^p$ can be obtained as the increasing limit
of nonnegative continuous functions. It therefore suffices to assume that $\kappa$ is the
indicator of an open subset of $\mathbb{R}^p$. But any probability measure on a Polish space
is regular, so it suffices to assume that $\kappa$ is the indicator function of a Borel subset
of $\mathbb{R}^p$. The proof is completed by an application of the Dynkin system lemma. □
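
For the reader's convenience, here is why (1.1) turns the two-sided bound into an equality. This is a sketch, with $\pi_0 := \mathbf{P}[X_0 \in \cdot \mid \mathcal{F}^Y_{-\infty,0}]$ and $m_0$ its law as in the proof of Lemma A.5.

```latex
% If (1.1) holds, then \mathcal{G}_{-\infty,0} = \mathcal{F}^Y_{-\infty,0} \mathbf{P}-a.s.,
% so the lower and upper bounds of the proposition coincide:
\[
  \mathbf{E}\bigl[\kappa\bigl(\pi_0(f_1),\dots,\pi_0(f_p)\bigr)\bigr]
  \;\le\; \int F(\nu)\, m(d\nu)
  \;\le\; \mathbf{E}\bigl[\kappa\bigl(\pi_0(f_1),\dots,\pi_0(f_p)\bigr)\bigr].
\]
% Both extremes equal \mathbf{E}[F(\pi_0)] = \int F(\nu)\, m_0(d\nu), hence
\[
  \int F(\nu)\, m(d\nu) = \int F(\nu)\, m_0(d\nu).
\]
```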

A.3. Proof of Theorem 1.1: necessity. We will in fact prove necessity under
a weaker assumption than stated in the theorem: the key assumption is

\[
\bigcap_{n \le 0} (\mathcal{F}^Y_{-\infty,k} \vee \mathcal{F}^X_{-\infty,n}) = \mathcal{F}^Y_{1,k} \vee \bigcap_{n \le 0} (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n}) \quad \mathbf{P}\text{-a.s.} \quad \forall\, k \in \mathbb{N}. \tag{A.1}
\]

The assumption in the theorem that $\Phi$ possesses a transition density only enters
the proof inasmuch as it guarantees the validity of (A.1). Let us note that the assumption
of the theorem is itself weaker than nondegeneracy of the observations, as the
transition density is not required to be strictly positive here.

LEMMA A.8. Suppose there exists a $\sigma$-finite reference measure $\varphi$ on $F$ and a
transition density $g : E \times F \to [0,\infty[$ such that $\Phi(x,A) = \int \mathbf{1}_A(y)\, g(x,y)\, \varphi(dy)$
for all $x \in E$, $A \in \mathcal{B}(F)$. Then the identity (A.1) holds true.

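PROOF. It is easily seen that the assumption guarantees the existence of a probability
measure $\mathbf{Q}$ such that $\mathbf{P} \ll \mathbf{Q}$ and $\mathcal{F}^Y_{1,k}$ is independent of $\mathcal{F}^X_{-\infty,0} \vee \mathcal{F}^Y_{-\infty,0}$
under $\mathbf{Q}$. Thus the identity in (A.1) holds $\mathbf{Q}$-a.s., and therefore $\mathbf{P}$-a.s. □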
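
The step from independence to the identity (A.1) rests on a standard fact about decreasing families of $\sigma$-fields. The following is a sketch of how it is applied here; the shorthand $\mathcal{H}$ and $\mathcal{G}_n$ is introduced only for this illustration and is not the paper's notation.

```latex
% Shorthand (illustrative only):
%   \mathcal{H}   := \mathcal{F}^Y_{1,k},
%   \mathcal{G}_n := \mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n}, \quad n \le 0,
% so that \mathcal{G}_n decreases as n \to -\infty and
% \mathcal{H} \vee \mathcal{G}_n = \mathcal{F}^Y_{-\infty,k} \vee \mathcal{F}^X_{-\infty,n}.
%
% Standard fact: if \mathcal{H} is independent of \mathcal{G}_0 under \mathbf{Q}, then
\[
  \bigcap_{n \le 0} (\mathcal{H} \vee \mathcal{G}_n)
  = \mathcal{H} \vee \bigcap_{n \le 0} \mathcal{G}_n
  \qquad \mathbf{Q}\text{-a.s.}
\]
% Since \mathbf{P} \ll \mathbf{Q}, every \mathbf{Q}-null set is \mathbf{P}-null,
% so the same identity holds \mathbf{P}-a.s., which is (A.1).
```

Without the independence under $\mathbf{Q}$ the exchange of intersection and supremum can fail, which is precisely the phenomenon the paper is concerned with.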

The proof is based on the following result. 

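LEMMA A.9. Suppose there exists a unique $\Pi$-invariant measure with barycenter
$\mu$, and that assumption (A.1) holds. Then we have for every $A \in \mathcal{B}(E)$

\[
\mathbf{P}\bigl[X_0 \in A \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr] = \mathbf{P}[X_0 \in A \mid \mathcal{F}^Y_{-\infty,0}] \quad \mathbf{P}\text{-a.s.}
\]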

PROOF. Define the regular conditional probabilities $\pi_k^0 = \mathbf{P}[X_k \in \cdot \mid \mathcal{F}^Y_{-\infty,k}]$
and $\pi_k^1 = \mathbf{P}[X_k \in \cdot \mid \bigcap_n (\mathcal{F}^Y_{-\infty,k} \vee \mathcal{F}^X_{-\infty,n})]$, and denote by $m_0, m_1 \in \mathcal{P}(\mathcal{P}(E))$
the laws of $\pi_0^0$ and $\pi_0^1$, respectively. Then $m_0$ is the $\Pi$-invariant measure defined in
the proof of Lemma A.5. We claim that $m_1$ is also $\Pi$-invariant. Indeed, this follows
as a variant of Lemma A.2 (pp. 95-96 in [29]) and the assumption (A.1) imply
that $\pi_k^1 = \Pi(\pi_{k-1}^1, Y_k)$ $\mathbf{P}$-a.s., so that $(\pi_k^1)_{k\in\mathbb{Z}}$ is Markov with transition kernel $\Pi$,
while $(\pi_k^1)_{k\in\mathbb{Z}}$ is easily seen to be a stationary process.

Clearly $m_0$ and $m_1$ both have barycenter $\mu$, so by assumption $m_0 = m_1$. Thus

\[
\mathbf{E}[(\pi_0^1(A) - \pi_0^0(A))^2] = \mathbf{E}[(\pi_0^1(A))^2] - \mathbf{E}[(\pi_0^0(A))^2] = m_1(F_A) - m_0(F_A) = 0
\]

for every $A \in \mathcal{B}(E)$, where we defined $F_A : \nu \mapsto (\nu(A))^2$. It follows that $\pi_0^1(A) =
\pi_0^0(A)$ $\mathbf{P}$-a.s. for every $A \in \mathcal{B}(E)$, which completes the proof. □
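
The first equality in the display above is the usual orthogonality of conditional expectations. A sketch of the computation, using only that $\mathcal{F}^Y_{-\infty,0} \subseteq \bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})$:

```latex
% Tower property: \pi_0^0(A) = \mathbf{E}[\pi_0^1(A) \mid \mathcal{F}^Y_{-\infty,0}], hence
\[
  \mathbf{E}[\pi_0^1(A)\,\pi_0^0(A)]
  = \mathbf{E}\bigl[\mathbf{E}[\pi_0^1(A) \mid \mathcal{F}^Y_{-\infty,0}]\;\pi_0^0(A)\bigr]
  = \mathbf{E}[(\pi_0^0(A))^2].
\]
% Expanding the square:
\begin{align*}
  \mathbf{E}[(\pi_0^1(A) - \pi_0^0(A))^2]
    &= \mathbf{E}[(\pi_0^1(A))^2] - 2\,\mathbf{E}[\pi_0^1(A)\,\pi_0^0(A)] + \mathbf{E}[(\pi_0^0(A))^2] \\
    &= \mathbf{E}[(\pi_0^1(A))^2] - \mathbf{E}[(\pi_0^0(A))^2] \\
    &= \int \nu(A)^2\, m_1(d\nu) - \int \nu(A)^2\, m_0(d\nu) = 0.
\end{align*}
```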

To complete the proof, we require the following easy variant of Lemma A.1.

LEMMA A.10. For $\nu \in \mathcal{P}(E)$ and $k \in \mathbb{N}$, define the probability measure

\[
\mathbf{P}^\nu(A) = \int \mathbf{1}_A(x_0, y_1,\ldots,y_k)\, \nu(dx_0)\, P(x_0,dx_1)\, \Phi(x_1,dy_1) \cdots P(x_{k-1},dx_k)\, \Phi(x_k,dy_k) \quad \text{for } A \in \mathcal{B}(E \times F^k).
\]

Denote by $X_0 : E \times F^k \to E$ and $Y^k : E \times F^k \to F^k$ the canonical projections.
There exists a measurable map $\Sigma_k : \mathcal{P}(E) \times F^k \to \mathcal{P}(E)$ such that $\Sigma_k(\nu, Y^k)$ is a
version of the regular conditional probability $\mathbf{P}^\nu(X_0 \in \cdot \mid Y^k)$ for every $\nu \in \mathcal{P}(E)$.

We now complete the proof. 

PROPOSITION A.11. Suppose there exists a unique $\Pi$-invariant measure with
barycenter $\mu$ and that assumption (A.1) holds. Then (1.1) holds true.

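PROOF. As $\bigcup_{k \le 0} L^1(\mathcal{F}^X_{k,0} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$ is dense in $L^1(\mathcal{F}^X_{-\infty,0} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$, it
suffices to show that for every $k \le 0$ and $Z \in L^1(\mathcal{F}^X_{k,0} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$

\[
\mathbf{E}\bigl[Z \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr] = \mathbf{E}[Z \mid \mathcal{F}^Y_{-\infty,0}] \quad \mathbf{P}\text{-a.s.}
\]

However, for $Z \in L^1(\mathcal{F}^X_{k,0} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$, we have by the Markov property

\[
\mathbf{E}\bigl[Z \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr] = \mathbf{E}\bigl[\mathbf{E}[Z \mid \sigma\{X_k\} \vee \mathcal{F}^Y_{-\infty,0}] \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr].
\]

It therefore suffices to consider $Z \in L^1(\sigma\{X_k\} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$. But note that the class
of random variables $\{Z_X Z_Y : Z_X \in L^\infty(\sigma\{X_k\}, \mathbf{P}),\ Z_Y \in L^\infty(\mathcal{F}^Y_{-\infty,0}, \mathbf{P})\}$ is
total in $L^1(\sigma\{X_k\} \vee \mathcal{F}^Y_{-\infty,0}, \mathbf{P})$. Therefore, it suffices to show that

\[
\mathbf{P}\bigl[X_k \in A \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr] = \mathbf{P}[X_k \in A \mid \mathcal{F}^Y_{-\infty,0}] \quad \mathbf{P}\text{-a.s.}
\]

for all $k \le 0$ and $A \in \mathcal{B}(E)$. For $k = 0$, this follows directly from Lemma A.9.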
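
The reduction from products $Z_X Z_Y$ to indicators of $X_k$ can be sketched as follows. The shorthand $\mathcal{G}$ is introduced only for this illustration; the point is that $Z_Y$ is measurable with respect to both conditioning $\sigma$-fields and can be pulled out of each conditional expectation.

```latex
% Shorthand (illustrative only):
%   \mathcal{G} := \bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})
%                  \supseteq \mathcal{F}^Y_{-\infty,0}.
\begin{align*}
  \mathbf{E}[Z_X Z_Y \mid \mathcal{G}]
    &= Z_Y\, \mathbf{E}[Z_X \mid \mathcal{G}], \\
  \mathbf{E}[Z_X Z_Y \mid \mathcal{F}^Y_{-\infty,0}]
    &= Z_Y\, \mathbf{E}[Z_X \mid \mathcal{F}^Y_{-\infty,0}].
\end{align*}
% So it is enough that
%   \mathbf{E}[Z_X \mid \mathcal{G}] = \mathbf{E}[Z_X \mid \mathcal{F}^Y_{-\infty,0}]
% for bounded \sigma\{X_k\}-measurable Z_X. By linearity and uniform
% approximation by simple functions, this reduces to Z_X = \mathbf{1}_A(X_k),
% A \in \mathcal{B}(E).
```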

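For $k < 0$, we proceed as follows. Define $\pi_k^0$ and $\pi_k^1$ as in the proof of Lemma
A.9. It is easily established using Lemma A.2 that

\[
\mathbf{P}[X_k \in \cdot \mid \mathcal{F}^Y_{-\infty,0}] = \Sigma_{-k}(\pi_k^0, Y_{k+1},\ldots,Y_0) \quad \mathbf{P}\text{-a.s.}
\]

Similarly, a variant of Lemma A.2 (pp. 95-96 in [29]) and (A.1) imply

\[
\mathbf{P}\bigl[X_k \in \cdot \,\big|\, \textstyle\bigcap_n (\mathcal{F}^Y_{-\infty,0} \vee \mathcal{F}^X_{-\infty,n})\bigr] = \Sigma_{-k}(\pi_k^1, Y_{k+1},\ldots,Y_0) \quad \mathbf{P}\text{-a.s.}
\]

But by Lemma A.9, applying the Dynkin system lemma with a countable generating
system, and using that $(X_k,Y_k)_{k\in\mathbb{Z}}$ is stationary under $\mathbf{P}$, it follows directly
that $\pi_k^0 = \pi_k^1$ $\mathbf{P}$-a.s. This completes the proof. □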